Abstract

We present a new framework for studying C∞-words, that is, words over the alphabet Σ = {1,2} that are
differentiable arbitrarily many times. After introducing an equivalence relation on C∞-words, whose classes are
called minimal classes and represent all the C∞-words, we define a vertical coding of these words based
on a three-letter alphabet, and a set of functions operating over this representation. We show that the
minimal classes of C∞-words can be represented on an infinite directed acyclic graph which, like
all the functions introduced for the vertical coding, can be defined recursively with no explicit reference to
C∞-words. This new representation adequately expresses the combinatorial structure of C∞-words, and
brings new perspectives to the study of the Kolakoski word and its factors.
1. Introduction

The Kolakoski word [11] is the unique right-infinite word K over the alphabet Σ = {1,2} starting with 2
and coinciding with its run-length encoding:

K = 22 11 2 1 22 1 22 11 2 11 22 1 2 11 2 1 ···
     2  2 1 1  2 1  2  2 1  2  2 1 1  2 1 1 ···

This mysterious self-generating word is far from being well understood, and several longstanding conjectures on
its structure remain unproved. Kimberling [10] asked whether the Kolakoski word is recurrent and whether
the set of its factors is closed under complement (swapping of 1's and 2's). Dekking [6] observed that the
latter condition implies the former, and introduced an operator on finite words, called the derivative, that
consists in discarding the first and/or the last run if these have length 1 and then applying the run-length
encoding. For example, the derivative of 2122 is 12. The set of words over Σ that are derivable arbitrarily many
times, denoted by C∞, is then closed under complement and reversal, and contains the set of factors
of the Kolakoski word. Therefore, one of the most important open problems about the Kolakoski word is to
decide whether all the words in C∞ occur as factors in the Kolakoski word:

Conjecture 1.1. Fact(K) = C∞.

Actually, the set C∞ contains the set of factors of any right-infinite word over Σ having the property
that an arbitrary number of applications of the run-length encoding still produces a word over Σ. Such
words are called smooth words [3l[T]. Nevertheless, the existence of a smooth word such that the set of its
factors is equal to the whole set C∞ is still an open question.

Another renowned open problem is to decide whether the Kolakoski word (or any other smooth word)
is recurrent (every factor appears infinitely often) or even uniformly recurrent (consecutive occurrences of
the same factor appear with bounded gap). Should Conjecture 1.1 be true, the Kolakoski word would be
recurrent [6].

In addition to the aforementioned problems, there is a conjecture of Keane [9] stating that the frequencies
of 1's and 2's in the Kolakoski sequence exist and are equal to 1/2. Chvátal [5] showed that if these limits
exist, they are very close to 1/2 (actually, between 0.499162 and 0.500838).

Up to now, only a few combinatorial properties of C∞-words have been established. Weakley [13] started
a classification of C∞-words and obtained significant results on their complexity function. Carpi [4] proved
that the set C∞ contains only a finite number of squares, and does not contain cubes (see also [12] and [2]).
This result generalizes to repetitions with gap, i.e., to the words in C∞ of the form uzu, for a non-empty z.
Indeed, Carpi [4] proved that for every k > 0, only finitely many C∞-words of the form uzu exist with z not
longer than k. In a recent paper [7], we proved that for any u ∈ C∞, there exists a z such that uzu ∈ C∞,
and |uzu| < for a suitable constant C. In the same paper, we proposed the following conjecture:

Universal Glueing Conjecture. For any u, v ∈ C∞, there exists z such that uzv ∈ C∞.

Although the Universal Glueing Conjecture is a weaker assumption than Conjecture 1.1, it remains
an open question. Its validity would imply, among other things, that for any integer n > 0, there exists a
C∞-word containing as factors all the C∞-words of length n.

Let w be a C∞-word. Recall that any word v such that w is the derivative of v is called a primitive of
w. For example, the primitives of the word 21 are 2212, 1121, 12212 and 21121. The C∞-words having the
property that each derivative is obtained from the shortest primitive are called minimal words. Analogously,
C∞-words having the property that each derivative is obtained from the longest primitive are called maximal
words. Moreover, a C∞-word is single-rooted if its last non-empty derivative is a word of length one (that
is, 1 or 2), or double-rooted if its last non-empty derivative is a word of length two (that is, 12 or 21).

In this paper, we mostly focus on single-rooted minimal words. Indeed, we show in Theorem 2.3 that
every C∞-word w of height k > 0 (the height of a C∞-word w is defined as the least integer k such that
the k-th derivative of w is the empty word) contains exactly one single-rooted minimal factor of height k
(if w is single-rooted) or two single-rooted minimal factors of height k (if w is double-rooted). The first
single-rooted minimal factor of height k appearing in a C∞-word w is called the minimal part of w. Thus,
every C∞-word is an extension of its minimal part preserving the height. This allows us to consider classes
of C∞-words having the same minimal part (that we call minimal classes).

We recall a framework for dealing with C∞-words that we introduced in a recent paper [7], based on a
three-letter alphabet. We define, for every C∞-word w, two words U_0 and V_0 over the alphabet Σ_0 = {0, 1, 2},
called respectively the left frontier and the right frontier of w, which code respectively the sequences of the
first and the last symbols of the derivatives of w. The pair (U_0, V_0) uniquely determines w and is called the
vertical representation of w (Theorem 3.1).



Minimal words do not have 0's in their vertical representation, that is, they are coded by left and right frontiers
over the alphabet Σ = {1,2}. As a consequence, single-rooted minimal words (and hence minimal classes)
are in bijection with Σ*.

We then define the functions Γ_s and Γ_d, which map the left frontier of a C∞-word into its right frontier.
More precisely, for any U ∈ Σ*, Γ_s(U) is the word V ∈ Σ* such that (U, V) is the vertical representation of the
single-rooted minimal word having U as left frontier, whereas Γ_d(U) is the word V ∈ Σ* such that (U, V)
is the vertical representation of the double-rooted minimal word having U as left frontier. The functions Γ_s
and Γ_d are involutions and therefore establish bijections of Σ*. We also define the compositions Θ = Γ_s ∘ Γ_d
and Π = Γ_d ∘ Γ_s, for which we are able to find a very compact recursive form (Corollary 4.3), that can be
defined independently from the context of C∞-words (Theorem 4.4). The functions Γ_s and Γ_d, and therefore
Θ and Π, can be naturally extended to words with 0's, that is, words coding the frontiers of any C∞-word.

We then show that any single-rooted minimal word u having left frontier U ∈ Σ* and height k > 0 has
three extensions to the right of minimal length having height k + 1: the two single-rooted minimal words
having left frontier U1 and U2 respectively, and the word having left frontier U0. We show that the minimal
part of this latter word is the single-rooted minimal word having left frontier Θ(U)2 (Theorem 5.1).

This allows us to define the graph of minimal classes G, whose set of nodes is Σ* and in which every node has
three outgoing edges, labeled respectively by 1, 2 and 0. For every non-empty U ∈ Σ* there is an edge from
U to U1 labeled by 1, an edge from U to U2 labeled by 2, and an edge from U to Θ(U)2 labeled by 0.

In [7], we introduced an infinite state automaton for representing the (classes of) C∞-words. In fact, G
is the graph of this automaton. As a consequence of the recursive formulae given for Θ and Π, we have that
the graph G can be defined recursively with no explicit reference to C∞-words.

We end the paper by formulating conjectures on the graph G that imply conjectures on C∞-words and
on the Kolakoski word. As a consequence of our approach, these conjectures can be stated in the context
of directed acyclic graphs or even in that of recursive functions, that is, independently from the context of
C∞-words.

The paper is organized as follows. In Sec. 2 we gather the background on C∞-words and we fix the
notation. In Sec. 3 we introduce the frontiers of C∞-words and the vertical representation, whereas in Sec.
4 we define the functions on the frontiers of a C∞-word and derive recursive formulae for them. In Sec.
5 we show how a C∞-word can be extended to the right into another C∞-word from the point of view of
the vertical representation, and present the graph G of minimal classes of C∞-words. Finally, in Sec. 6 we
discuss conclusions and open problems.



2. C∞-words

We fix the two-letter alphabet Σ = {1, 2}. A word over Σ is a finite sequence of letters from Σ. The length
of a word w is denoted by |w|. The empty word has length zero and is denoted by ε. The set of all words
over Σ is denoted by Σ*. The set of all finite words over Σ having length n is denoted by Σ^n. The (i + 1)-th
symbol of a word w is denoted by w[i]. So, we write a word w of length n > 0 as w = w[0]w[1]···w[n − 1].

Let w ∈ Σ*. If w = uv for some u, v ∈ Σ*, we say that u is a prefix of w and v is a suffix of w. A factor of
w is a prefix of a suffix of w (or, equivalently, a suffix of a prefix). The reversal of w is the word $\tilde{w}$ obtained
by writing the letters of w in the reverse order. For example, the reversal of w = 11212 is $\tilde{w}$ = 21211. The
complement of w is the word $\overline{w}$ obtained by swapping the letters of w, i.e., by changing the 1's into 2's and
the 2's into 1's. For example, the complement of w = 11212 is $\overline{w}$ = 22121.

A right-infinite word over Σ is a non-ending sequence of letters from Σ. The set of all right-infinite words
over Σ is denoted by Σ^ω.

Let w be a word over Σ. Then w can be uniquely written as a concatenation of maximal blocks of
identical symbols (called runs), i.e., $w = x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}$, with $x_j \neq x_{j+1}$ and $i_j > 0$. The run-length
encoding of w, denoted Δ(w), is the sequence of exponents $i_j$, i.e., one has $\Delta(w) = i_1 i_2 \cdots i_n$. The run-length
encoding extends naturally to right-infinite words.

Definition 1. /^/ A right-infinite word W ∈ Σ^ω is called a smooth word over Σ if for every integer k > 0
one has that Δ^k(W) is still a word over Σ.

The operator Δ on right-infinite words over Σ has two fixed points, namely the Kolakoski word
K = 221121221221121122121121221121121221221121221211211221221121···
and the word 1K.
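
The fixed-point property can be used to generate K letter by letter. The following is a minimal sketch, not taken from the paper: the function name `kolakoski_prefix` is hypothetical, and it relies only on the fact that the i-th letter of K is the length of its i-th run.

```python
def kolakoski_prefix(n):
    """Return the first n letters of the Kolakoski word K (the one starting with 2)."""
    word = [2, 2]      # the first run has length word[0] = 2, so K starts with 22
    i = 1              # index of the letter that gives the length of the next run
    while len(word) < n:
        symbol = 3 - word[-1]              # consecutive runs alternate between 1 and 2
        word.extend([symbol] * word[i])    # the next run has length word[i]
        i += 1
    return "".join(str(x) for x in word[:n])


print(kolakoski_prefix(20))
```

Prepending a 1 to the output gives the corresponding prefix of the other fixed point, 1K.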

In this paper, we focus on the set of factors of smooth words. We start by recalling some definitions.

Definition 2. /^/ A word w ∈ Σ* is differentiable if Δ(w) is still a word over Σ.

Remark 1. Since Σ = {1, 2} we have that w is differentiable if and only if neither 111 nor 222 appear in w.
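Definition 3. [6] The derivative is the function D : Σ* → Σ* defined on the differentiable words by:

$$
D(w) = \begin{cases}
\varepsilon & \text{if } \Delta(w) = 1 \text{ or } w = \varepsilon, \\
\Delta(w) & \text{if } \Delta(w) = 2x2 \text{ or } \Delta(w) = 2, \\
x2 & \text{if } \Delta(w) = 1x2, \\
2x & \text{if } \Delta(w) = 2x1, \\
x & \text{if } \Delta(w) = 1x1.
\end{cases}
$$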



In other words, the derivative D{w) is obtained from A{w) by erasing the first and/or the last symbol 
if they are equal to 1. 
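
As an illustration, the run-length encoding and the derivative can be sketched in a few lines of Python. This is only a sketch under the assumption that words are represented as strings over the characters '1' and '2'; the function names are hypothetical and not part of the paper.

```python
from itertools import groupby


def run_length_encoding(w):
    """Delta(w): the lengths of the maximal runs of w, written as a string."""
    return "".join(str(len(list(run))) for _, run in groupby(w))


def is_differentiable(w):
    """By Remark 1, w is differentiable iff it contains neither 111 nor 222."""
    return "111" not in w and "222" not in w


def derivative(w):
    """D(w): Delta(w) with its first and/or last symbol erased when equal to 1."""
    if not is_differentiable(w):
        raise ValueError(f"{w!r} is not differentiable")
    d = run_length_encoding(w)
    if d.startswith("1"):
        d = d[1:]
    if d.endswith("1"):
        d = d[:-1]
    return d


print(derivative("2122"))  # the example from the Introduction
```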

Let k > 0. A word w is k-differentiable on Σ if D^k(w) is defined. By Remark 1, a word w is k-differentiable
if and only if for every 0 ≤ j < k the word D^j(w) contains neither 111 nor 222 as a factor. We
use the convention that D^0(w) = w. Clearly, if a word is k-differentiable, then it is also j-differentiable for
every 0 ≤ j ≤ k.

We denote by C^k the set of k-differentiable words, and by C∞ the set of words which are differentiable
arbitrarily many times. Therefore, $C^\infty = \bigcap_{k>0} C^k$. A word in C∞ is also called a C∞-word [6].

As a direct consequence of the definition, we have that the set C∞ and, for any k > 0, the set C^k, are
closed under reversal and complement.

Definition 4. [13] The height of a C∞-word w is the least integer k such that D^k(w) = ε.

Definition 5. [7] Let w ∈ C∞ be of height k > 0. The root of w is D^{k−1}(w). Therefore, the root of w belongs
to {1,2,12,21}. Consequently, w is single-rooted if its root has length one or double-rooted if its root has
length two.
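
Since every derivative of a non-empty word is strictly shorter than the word itself, membership of a finite word in C∞, its height, and its root can all be computed by iterating D. The sketch below uses hypothetical names and assumes that words are strings over '1' and '2'; it returns `None` when the word is not in C∞.

```python
from itertools import groupby


def derivative(w):
    """D(w), or None if w is not differentiable (contains 111 or 222)."""
    if "111" in w or "222" in w:
        return None
    d = "".join(str(len(list(run))) for _, run in groupby(w))
    if d.startswith("1"):
        d = d[1:]
    if d.endswith("1"):
        d = d[:-1]
    return d


def derivative_chain(w):
    """[w, D(w), D^2(w), ...] up to the last non-empty derivative; None if w is not in C-infinity."""
    chain = []
    while w:
        chain.append(w)
        w = derivative(w)
        if w is None:
            return None
    return chain


def height(w):
    """Least k with D^k(w) empty, or None if w is not a C-infinity word."""
    chain = derivative_chain(w)
    return None if chain is None else len(chain)


def root(w):
    """Last non-empty derivative of w, or None if it does not exist."""
    chain = derivative_chain(w)
    return chain[-1] if chain else None


print(height("1122122"), root("1122122"))
```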

Definition 6. JUl A primitive of a word w is any word w' such that D(w') = w.

It is easy to see that any C∞-word has two, four or eight distinct primitives (actually, it has two primitives
if it starts and ends with 1, eight primitives if it starts and ends with 2, and four primitives otherwise). For example,
the word w = 22 has eight primitives (1122, 21122, 11221, 211221, 2211, 12211, 22112, 122112), whereas the
word w = 121 has only two primitives (121121, 212212).

However, every C∞-word admits exactly two primitives of minimal (resp. maximal) length, one being the
complement of the other.
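
The count of primitives can be checked by direct enumeration. The sketch below (hypothetical function names, words as strings over '1' and '2') relies on the following observation, which follows from the definition of D: a primitive v of a non-empty word w has Δ(v) equal to w with an optional 1 added on each side where w shows a 2, and a mandatory 1 added on each side where w shows a 1.

```python
def expand(lengths, first):
    """Word whose runs have the given lengths, with alternating letters, starting with `first`."""
    out, symbol = [], first
    for n in lengths:
        out.append(symbol * int(n))
        symbol = "1" if symbol == "2" else "2"
    return "".join(out)


def primitives(w):
    """All words v with D(v) = w, for a non-empty word w over {1, 2}."""
    # A leading (trailing) 1 of Delta(v) is erased by D, so it must be present
    # when w starts (ends) with 1, and is optional when w starts (ends) with 2.
    left_pads = ["", "1"] if w[0] == "2" else ["1"]
    right_pads = ["", "1"] if w[-1] == "2" else ["1"]
    return sorted(
        expand(left + w + right, first)
        for left in left_pads
        for right in right_pads
        for first in "12"
    )


print(primitives("21"))
```

The two shortest elements of the returned list are the primitives of minimal length, and the two longest are those of maximal length.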

Definition 7. [7] Let w be a C∞-word of height k > 1. We say that w is minimal (resp. maximal) if for
every 0 ≤ j ≤ k − 2, D^j(w) is a primitive of D^{j+1}(w) of minimal (resp. maximal) length. The words of height
k = 1 are assumed to be at the same time minimal and maximal.

We can define minimal and maximal words on one side only, in the following way. 

Definition 8. [7] A word w ∈ C∞ is left minimal (resp. left maximal) if it is a prefix of a minimal (resp.
maximal) word. Analogously, w is right minimal (resp. right maximal) if it is a suffix of a minimal (resp.
maximal) word.

Clearly, a word is minimal (resp. maximal) if and only if it is both left minimal and right minimal (resp. 
left maximal and right maximal). 

Example 1. The word 2211 is minimal, since 2211 is a primitive of 22 of minimal length and 22 is a
primitive of 2 of minimal length; the word 21221121 is maximal, since 21221121 is a primitive of 1221 of
maximal length and 1221 is a primitive of 2 of maximal length; the word 2122112 is left maximal but not
right maximal. Note that 2211 is a proper factor of 21221121 and that the two words have the same height
and the same root.

w         2211    2122112    21221121
D(w)      22      122        1221
D^2(w)    2       2          2
Weakley started a classification of C∞-words based on extendability. Indeed, any C∞-word has
arbitrarily long left and right extensions in C∞: for any C∞-word w, at least one of 1w and 2w
is a C∞-word. Analogously, at least one of w1 and w2 is a C∞-word.

Definition 9. [13] A word w ∈ C∞ is left doubly extendable (resp. right doubly extendable) if 1w and 2w
(resp. w1 and w2) are both in C∞. Otherwise, w is left simply extendable (resp. right simply extendable).
A word w ∈ C∞ is fully extendable if 1w1, 1w2, 2w1 and 2w2 are all in C∞.

It is worth noticing that a word can be at the same time left doubly extendable and right doubly 
extendable but not fully extendable. This is the case, for example, for the word w = 1. 

Note that, by definition, any C∞-word w can be extended to the left and to the right with simple
extensions in a unique way, up to a left-and-right doubly extendable (but not necessarily fully extendable)
word ŵ.

Based on the previous definitions and on a result of Weakley ([13], Proposition 3) one can establish the
following structural result.

Theorem 2.1. [7] Let w ∈ C∞. The following conditions are equivalent:

1. w is fully extendable (resp. w is left doubly extendable, resp. w is right doubly extendable). 

2. w is double-rooted maximal (resp. w is left maximal, resp. w is right maximal). 

3. w and all its derivatives (resp. w and all its derivatives longer than one) begin and end (resp. begin, 
resp. end) with two distinct symbols. 



Example 2. Consider the word w = 121. By Theorem 2.1, w is left doubly extendable and right doubly
extendable. Nevertheless, w is not fully extendable, since it is single-rooted. Indeed, the word 2w2 is not in
C∞.

We report here a lemma given in [7] with a slightly different statement. Basically, it states that extending
a right maximal word by one letter to the right (resp. extending a left maximal word by one letter to the left)
results in a right minimal word (resp. a left minimal word). Recall that, by Theorem 2.1, a right
(resp. left) maximal word is right (resp. left) doubly extendable.

Lemma 2.2. [7] Let w ∈ C∞ be a right maximal word (resp. a left maximal word). Then the words w1 and
w2 are right minimal words (resp. 1w and 2w are left minimal words).

Conversely, if wx, x ∈ Σ, is a right minimal word (resp. xw is a left minimal word), then w is a right
maximal word (resp. w is a left maximal word).

Let us summarize the previous results. Let v be a C∞-word of height k > 0. Then we know that v
can be extended, to the left and to the right, with simple extensions (therefore, in a unique way) up to
reaching a word v̂ which is left doubly extendable and right doubly extendable (so, is a maximal word).
Moreover, v̂ has the same root and the same height as v. Now, two substantially different cases arise: if v is
double-rooted, then so is v̂, and therefore v̂ is fully extendable, that is, the four words 1v̂1, 1v̂2, 2v̂1 and 2v̂2
are all C∞-words. In particular, there is a choice of x, y ∈ Σ such that the words x v̂ y and x̄ v̂ ȳ are minimal
words of height k + 2 and root 2, and the words x̄ v̂ y and x v̂ ȳ are double-rooted minimal words of height
k + 1. If instead v is single-rooted, then so is v̂, and therefore v̂ is not fully extendable. In fact, only three
among the four words 1v̂1, 1v̂2, 2v̂1 and 2v̂2 are C∞-words. In particular, there is a choice of x, y ∈ Σ such
that the word x v̂ y is a minimal word of height k + 1 and root 1. The words x̄ v̂ y and x v̂ ȳ are instead words
of height k + 1 and root 2, but they are minimal on one side only, while the word x̄ v̂ ȳ is not a C∞-word.¹

Example 3. The word v = 1122 is a single-rooted minimal word of height 3. It extends to the single-rooted
maximal word (of height 3) v̂ = 12112212. Table 1 shows the extensions of v.

Example 4. The word v = 1122122 is a double-rooted minimal word of height 3. It extends to the double-
rooted maximal word (of height 3) v̂ = 1211221221. Table 2 shows the extensions of v.

¹ It is in fact a minimal forbidden word for C∞ [7].
         v       v̂           x v̂          x̄ v̂          v̂ y          v̂ ȳ
D^0      1122    12112212    112112212    212112212    121122122    121122121
D^1      22      1221        21221        11221        12212        12211
D^2      2       2           12           22           21           22
D^3                                       2                         2

         x v̂ y         x̄ v̂ y         x v̂ ȳ         x̄ v̂ ȳ
D^0      1121122122    2121122122    1121122121    2121122121
D^1      212212        112212        212211        112211
D^2      121           221           122           222
D^3      1             2             2

Table 1: Left and right extensions of the single-rooted word 1122.

         v          v̂             x v̂            x̄ v̂            v̂ y            v̂ ȳ
D^0      1122122    1211221221    11211221221    21211221221    12112212211    12112212212
D^1      2212       12212         212212         112212         122122         122121
D^2      21         21            121            221            212            211
D^3                               1              2              1              2

         x v̂ y           x̄ v̂ y           x v̂ ȳ           x̄ v̂ ȳ
D^0      112112212211    212112212211    112112212212    212112212212
D^1      2122122         1122122         2122121         1122121
D^2      1212            2212            1211            2211
D^3      11              21              12              22
D^4      2                                               2

Table 2: Left and right extensions of the double-rooted word 1122122.
One of the aims of this paper is to show that single-rooted minimal words are all one needs for dealing
with C∞-words.

Theorem 2.3. Let w ∈ C∞ be of height k > 0. If w is single-rooted, then w contains exactly one single-rooted
minimal factor u of height k. If w is double-rooted, then w contains exactly two single-rooted minimal factors
u and u' of height k.

Theorem 2.3 allows us to state the following definition, which is fundamental for the rest of the paper.

Definition 10. Let w ∈ C∞ be of height k > 0. The first single-rooted minimal factor of height k appearing in
w is called the minimal part of w.

Example 5. Let w = 21221211221221121, which is a double-rooted word of height 4. Then, w contains two
single-rooted minimal factors of height 4: u = 2121122 and u' = 112212211. The word u is the minimal part
of w.

w         2 1 2 2 1 2 1 1 2 2 1 2 2 1 1 2 1
D(w)      1 2 1 1 2 2 1 2 2 1
D^2(w)    1 2 2 1 2
D^3(w)    2 1

Figure 1: The minimal part u of a word w.



Corollary 2.4. Any C∞-word of height k > 0 can be obtained from its minimal part by extensions, to the
left and to the right, preserving the height k.

The definition of minimal part induces an equivalence relation on the set of C∞-words (defined by the
property of having the same minimal part). We call a class of C∞-words w.r.t. this equivalence a minimal
class. As a direct consequence of Theorem 2.3, there is a one-to-one correspondence between minimal classes
and single-rooted minimal words. We will show that there is also a one-to-one correspondence between the
set of single-rooted minimal words and the set Σ*. Consequently there are exactly 2^k single-rooted minimal
words of height k (and so 2^k minimal classes of height k).

Weakley [13] conjectured that for every k > 0, any double-rooted maximal word (that is, by Theorem
2.1, any fully extendable word) of height k is shorter than any double-rooted maximal word of height k + 1.

Weakley's conjecture can also be rephrased in terms of minimal words, i.e., it is equivalent to the following

Conjecture 2.5. Any minimal word of root 2 and height k is shorter than any minimal word of root 2 and
height k + 1.

Indeed, any minimal word of root 2 and height k > 2 is of the form xwy, with x, y ∈ Σ and w a
double-rooted maximal word of height k − 2.

One can wonder whether Weakley's conjecture holds true when double-rooted maximal words are replaced
with single-rooted maximal words, i.e., whether any single-rooted maximal word of height k is shorter than
any single-rooted maximal word of height k + 1. Note that any minimal word of root 1 and height k > 1 is
of the form xwy, with x, y ∈ Σ and w a single-rooted maximal word of height k − 1. Therefore, one can
equivalently wonder whether any minimal word of root 1 and height k is shorter than any minimal word of
root 1 and height k + 1. However, the answer to the above question is negative. For example, there exist
minimal words w, w' of root 1 and height, respectively, 19 and 20, such that |w| = 3858 and |w'| = 3851.

It is worth noticing that Weakley's conjecture is also equivalent to the following

Conjecture 2.6. Any double-rooted minimal word of height k is shorter than any double-rooted minimal
word of height k + 1.

Indeed, as a consequence of Lemma [2^ xw^ x € S, is a minimal word of root 2 and height k + 1 if and 
only if xw is a double-rooted minimal word of height k. 



3. Vertical representation of C∞-words

We recall here the definition of vertical representation of a C∞-word [7].

We define a function Ψ for representing a C∞-word on a three-letter alphabet Σ_0 = {0, 1, 2}. This function
is a generalization of the function Φ considered in [3], that associates to any C∞-word w = w[0]w[1]···w[n − 1]
the sequence of the first symbols of the derivatives of w, that is, the function defined by Φ(w)[i] = D^i(w)[0]
for 0 ≤ i < k, where k is the height of w.

If one takes the first and the last symbol of each derivative of a C∞-word w, that is, the pair ($\Phi(w)$, $\Phi(\tilde{w})$),
one gets a representation of C∞-words that is not injective. For example, take the two C∞-words w = 2211
and w' = 21121221. Then one has Φ(w) = Φ(w') = 222 and $\Phi(\tilde{w}) = \Phi(\tilde{w}')$ = 122. In order to obtain an
injective representation, we need an extra symbol. We thus introduce the following definition.

We set Σ_0 = {0, 1, 2}. We also set

Σ_0^+ = {U ∈ Σ_0^* : U = ε or the first symbol of U is different from 0}.

Clearly, Σ* ⊂ Σ_0^+.

Definition 11. Let w = w[0]w[1]···w[n − 1] be a C∞-word of height k > 0. The left frontier of w is the word
Ψ(w) ∈ Σ_0^+ of length k defined by: Ψ(w)[0] = w[0] and, for 0 < i < k,

$$
\Psi(w)[i] = \begin{cases}
0 & \text{if } D^i(w)[0] = 2 \text{ and } D^{i-1}(w)[0] \neq D^{i-1}(w)[1], \\
D^i(w)[0] & \text{otherwise.}
\end{cases}
$$

For the empty word, we set Ψ(ε) = ε.

The right frontier of w is defined as $\Psi(\tilde{w})$. If U and V are respectively the left and right frontier of w,
we call U|V the vertical representation of w.

In other words, to obtain the left (resp. the right) frontier of w, one has to take the first (resp. the last)
symbol of each derivative of w and replace a 2 by a 0 whenever the primitive above is not left minimal (resp.
is not right minimal).

Example 6. Let w = 21221211221. We have:

D^0(w)    21221211221
D^1(w)    121122
D^2(w)    122
D^3(w)    2

The word D^2(w) = 122 is not a left minimal primitive of the word D^3(w) = 2, and therefore Ψ(w)[3], the
fourth symbol of the left frontier of w, is a 0; analogously, the word w = 21221211221 is not a right minimal
primitive of D(w) = 121122, and therefore $\Psi(\tilde{w})[1]$, the second symbol of the right frontier of w, is a 0.
Hence, the vertical representation of w is $\Psi(w)|\Psi(\tilde{w})$ = 2110|1022.
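
Definition 11 translates directly into code. The following sketch (hypothetical names, words as strings over '1' and '2', input assumed to be a C∞-word) computes both frontiers and reproduces Example 6, as well as the pair w = 2211, w' = 21121221 that the zero-free coding cannot distinguish.

```python
from itertools import groupby


def derivative(w):
    """D(w) for a differentiable word w over {1, 2}."""
    if "111" in w or "222" in w:
        raise ValueError(f"{w!r} is not differentiable")
    d = "".join(str(len(list(run))) for _, run in groupby(w))
    if d.startswith("1"):
        d = d[1:]
    if d.endswith("1"):
        d = d[:-1]
    return d


def derivative_chain(w):
    """[D^0(w), D^1(w), ..., D^(k-1)(w)] for a C-infinity word w of height k."""
    chain = []
    while w:
        chain.append(w)
        w = derivative(w)
    return chain


def left_frontier(w):
    """Psi(w): first symbols of the derivatives, with the 0 rule of Definition 11."""
    chain = derivative_chain(w)
    frontier = []
    for i, d in enumerate(chain):
        # chain[i - 1] has length >= 2 here, otherwise its derivative would be empty
        if i > 0 and d[0] == "2" and chain[i - 1][0] != chain[i - 1][1]:
            frontier.append("0")
        else:
            frontier.append(d[0])
    return "".join(frontier)


def vertical_representation(w):
    """The pair (left frontier, right frontier); the right frontier is Psi of the reversal."""
    return left_frontier(w), left_frontier(w[::-1])


print(vertical_representation("21221211221"))
```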

The following theorem is a direct consequence of the definition of vertical representation. 
Theorem 3.1. [7] Any C∞-word is uniquely determined by its vertical representation; that is, the map
defined on C∞ by

$$ w \mapsto \Psi(w) \,|\, \Psi(\tilde{w}) $$

is injective.

Remark 2. In what follows, uppercase letters (U, V, W, ...) will denote vertical words, that are words over
Σ_0^+ coding the (left or right) frontier of a C∞-word; lowercase letters (u, v, w, ...) will still denote C∞-words.

Remark 3. If w is a minimal word, then, by definition, the left and the right frontier of w are vertical words
belonging to Σ*. That is, they do not contain 0. In particular, if w is a single-rooted (or a double-rooted)
minimal word, then the left frontier univocally determines the right frontier and vice versa.

For minimal words, the knowledge of one frontier is sufficient to reconstruct the word. For example,
let U = U[0]U[1]···U[k − 1] ∈ Σ^k. Then, the unique single-rooted minimal word having U as left frontier is the
word w such that D^i(w) is the shortest primitive of D^{i+1}(w) which begins with the symbol U[i], for every
0 ≤ i < k. Analogously, one can construct the unique double-rooted minimal word having U as left frontier.

Hence, we can state the following 

Proposition 3.2. There is a one-to-one correspondence between the set of single-rooted minimal words and
Σ*. This correspondence is given by the map Ψ.

Therefore, there are exactly 2^k single-rooted minimal words of height k.

Analogously, the map Ψ gives also a one-to-one correspondence between the set of double-rooted minimal
words and Σ*, and therefore there are exactly 2^k double-rooted minimal words of height k.
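
The reconstruction of a minimal word from its left frontier can be sketched as follows. The function names are hypothetical; the code assumes a non-empty frontier over {1, 2} and uses the fact that the shortest primitive of a word adds a run of length 1 only on the sides where the word shows a 1.

```python
def shortest_primitive(w, first):
    """Shortest word v with D(v) = w whose first letter is `first` (w non-empty over {1, 2})."""
    # A 1 at either end of Delta(v) is erased by D, so it is added only where needed.
    lengths = ("1" if w[0] == "1" else "") + w + ("1" if w[-1] == "1" else "")
    out, symbol = [], first
    for n in lengths:
        out.append(symbol * int(n))
        symbol = "1" if symbol == "2" else "2"
    return "".join(out)


def minimal_word(U, double_rooted=False):
    """Single-rooted (or double-rooted) minimal word having U as left frontier."""
    last = U[-1]
    if double_rooted:
        w = last + ("1" if last == "2" else "2")   # root 12 or 21, starting with U[k-1]
    else:
        w = last                                    # root 1 or 2
    for letter in reversed(U[:-1]):
        w = shortest_primitive(w, letter)
    return w


print(minimal_word("2122"))                       # single-rooted
print(minimal_word("122", double_rooted=True))    # double-rooted
```

The first call returns the factor u of Example 5, and the second returns the word v of Example 4.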

Clearly, if a smooth word contains all the single-rooted minimal words of height k + 1 as factors, then
it will also contain as factors all the C∞-words of height k, by Theorem 2.3. So, in order to prove that a
smooth word W over Σ contains as factors all the C∞-words, it is sufficient to prove that W contains as
factors all the single-rooted minimal words.

We end this section with a conjecture.

Conjecture 3.3. There exists a linear function f(k) such that any C∞-word of height f(k) contains as
factors all the single-rooted minimal words of height k.

Should Conjecture 3.3 be true, any smooth word over Σ would contain all the C∞-words as factors and
would be uniformly recurrent. In particular, this would be true for the Kolakoski word.

4. Functions on the frontiers 

Let U ∈ Σ*. Then, by Proposition 3.2, U is the left frontier of a unique single-rooted minimal word w.
We denote by Γ_s(U) the right frontier of w. Analogously, U is the left frontier of a unique double-rooted
minimal word w'. We denote by Γ_d(U) the right frontier of w'.

Example 7. Let U = 2122. Then Γ_s(U) = 2222 and Γ_d(U) = 1221. The situation is depicted in Fig. 2.
Clearly, Γ_s^2(U) = Γ_d^2(U) = U for any U ∈ Σ*.



Lemma 4.1. For any U ∈ Σ* one has $\Gamma_d(\overline{U}) = \overline{\Gamma_d(U)}$.
We also define the compositions

Θ = Γ_s ∘ Γ_d

and

Π = Γ_d ∘ Γ_s.

Therefore, Θ(Π(U)) = Π(Θ(U)) = U for any U ∈ Σ*.
The functions Π and Θ allow us to describe, from the point of view of the vertical representation, how
single-rooted minimal words can be extended, respectively to the left and to the right, into double-rooted
minimal words of the same height. Let w ∈ C∞ be a double-rooted minimal word and let U ∈ Σ* be its left
frontier. Then w can be viewed as the overlap between two single-rooted minimal words u and u'. The left
frontiers of u and u' are then U and Θ(U) respectively. Clearly, Π is the inverse of Θ.

Example 8. Let U = 2122. Then Θ(U) = 1221 and Π(U) = 1121. The situation is depicted in Fig. 3.

Figure 3: The functions Θ and Π.
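
The four functions can be evaluated directly from their definitions by building the minimal word and reading its right frontier. The sketch below uses hypothetical names and relies on Remark 3: for a minimal word the right frontier contains no 0, so it is simply the sequence of the last letters of the derivatives. It reproduces the values of Examples 7 and 8.

```python
def shortest_primitive(w, first):
    """Shortest word v with D(v) = w whose first letter is `first`."""
    lengths = ("1" if w[0] == "1" else "") + w + ("1" if w[-1] == "1" else "")
    out, symbol = [], first
    for n in lengths:
        out.append(symbol * int(n))
        symbol = "1" if symbol == "2" else "2"
    return "".join(out)


def minimal_chain(U, double_rooted):
    """Derivatives [D^0(w), ..., D^(k-1)(w)] of the minimal word w with left frontier U."""
    last = U[-1]
    w = last + ("1" if last == "2" else "2") if double_rooted else last
    chain = [w]
    for letter in reversed(U[:-1]):
        w = shortest_primitive(w, letter)
        chain.append(w)
    return chain[::-1]


def right_frontier_of_minimal(U, double_rooted):
    """Last letters of the derivatives (no 0 can occur for a minimal word, by Remark 3)."""
    if not U:
        return ""
    return "".join(level[-1] for level in minimal_chain(U, double_rooted))


def gamma_s(U):
    return right_frontier_of_minimal(U, double_rooted=False)


def gamma_d(U):
    return right_frontier_of_minimal(U, double_rooted=True)


def theta(U):
    return gamma_s(gamma_d(U))


def pi(U):
    return gamma_d(gamma_s(U))


U = "2122"
print(gamma_s(U), gamma_d(U), theta(U), pi(U))
```

This brute-force evaluation builds words whose length grows exponentially with the length of U, so it is only meant as a reference against which recursive formulae can be checked.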



Remark 4. Let k > 0. Then Γ_s, Γ_d, Θ and Π are bijections of Σ^k. Moreover, Γ_s and Γ_d are involutory.²

We now show that the functions Γ_s, Γ_d, and therefore Π and Θ, can be defined recursively and
independently from the context of C∞-words.
We clearly have

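Lemma 4.2. Let U ∈ Σ*. Then:

1. $\Gamma_s(U1) = \Gamma_d \circ \Gamma_s \circ \Gamma_d(U)\,1$

2. $\Gamma_s(U2) = \overline{\Gamma_d(U)}\,2$

3. $\Gamma_d(U1) = \overline{\Gamma_d \circ \Gamma_s \circ \Gamma_d \circ \Gamma_s \circ \Gamma_d(U)}\,2$

4. $\Gamma_d(U2) = \Gamma_d \circ \Gamma_s \circ \Gamma_d \circ \Gamma_s \circ \Gamma_d(\overline{U})\,1$

The lemma can be proved by simple observations on the compositions of the functions Γ_s and Γ_d (see
Fig. 4 for an example).

From the recursive formulae for Γ_s and Γ_d we can derive recursive formulae for Π and Θ.

² A map is involutory if the composition with itself is the identity map.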
Corollary 4.3. Let U ∈ Σ*. Then:

1. $\Pi(U1) = \overline{\Pi(U)}\,2$

2. $\Pi(U2) = \Pi^2(U)\,1$

3. $\Theta(U1) = \Theta^2(U)\,2$

4. $\Theta(U2) = \Theta(\overline{U})\,1$

Note that the function Π (or, analogously, the function Θ) can be defined recursively by the formulae
given in Corollary 4.3, with no explicit reference to C∞-words. Once the function Π has been so defined,
one can derive a recursive definition of Γ_s and Γ_d. This allows us to give a very compact description of
single-rooted minimal words, presented in the following theorem.

Theorem 4.4. Let U ∈ Σ*. Then:

1. $\Pi(U1) = \overline{\Pi(U)}\,2$

2. $\Pi(U2) = \Pi^2(U)\,1$

3. $\Gamma_s(UX) = \Pi(\Gamma_s(U)\,\overline{X})$, for any X ∈ Σ

4. $\Gamma_d(U) = \Pi(\Gamma_s(U))$

The theorem is a consequence of the recursive definition of Π given in Corollary 4.3 and of simple
arguments on the compositions of the functions over the frontiers (see Fig. 5).
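
A recursive evaluation along these lines can be sketched as below. The sketch rests on stated assumptions: the recursion used is Π(U1) = complement(Π(U))·2, Π(U2) = Π(Π(U))·1, Γ_s(UX) = Π(Γ_s(U)·complement(X)) and Γ_d = Π ∘ Γ_s, with base case ε ↦ ε, where the complement is taken letter by letter. These forms were chosen because they reproduce the values of Examples 7 and 8; the function names and the memoisation are illustrative choices, not part of the paper.

```python
from functools import lru_cache

_SWAP = str.maketrans("12", "21")


def complement(U):
    """Letter-wise complement of a word over {1, 2}."""
    return U.translate(_SWAP)


@lru_cache(maxsize=None)
def Pi(U):
    """Pi evaluated by recursion on the last letter of U (assumed recursion, see lead-in)."""
    if not U:
        return ""
    head, last = U[:-1], U[-1]
    if last == "1":
        return complement(Pi(head)) + "2"
    return Pi(Pi(head)) + "1"


@lru_cache(maxsize=None)
def Gamma_s(U):
    """Gamma_s(UX) = Pi(Gamma_s(U) complement(X))."""
    if not U:
        return ""
    return Pi(Gamma_s(U[:-1]) + complement(U[-1]))


def Gamma_d(U):
    """Gamma_d = Pi o Gamma_s."""
    return Pi(Gamma_s(U))


def Theta(U):
    """Theta = Gamma_s o Gamma_d."""
    return Gamma_s(Gamma_d(U))


U = "2122"
print(Pi(U), Gamma_s(U), Gamma_d(U), Theta(U))
```

Only frontier words are manipulated here; no C∞-word is ever constructed, which is the point of the recursive description.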



Theorem 4.4, together with Theorem 2.3, gives an essential representation of C∞-words. Most of the
algorithmic operations on C∞-words can be implemented by means of the formulae given in Theorem 4.4.
This reduces significantly the space needed for storing a C∞-word, since the height of a
C∞-word is logarithmic in its length.

The functions Γ_s, Γ_d, Θ and Π can be easily extended to Σ_0^+, as described below.

Let U_0 ∈ Σ_0^+ be a non-empty word, and let u be the shortest single-rooted C∞-word having U_0 as left
frontier (hence u is a right minimal word). We define the function Γ_s : Σ_0^+ → Σ* by

$$ \Gamma_s(U_0) = \Psi(\tilde{u}), $$

that is, Γ_s(U_0) is the right frontier of u.

Let v be the shortest double-rooted C∞-word having U_0 as left frontier (hence v is a right minimal word).
We define the function Γ_d : Σ_0^+ → Σ* by

$$ \Gamma_d(U_0) = \Psi(\tilde{v}), $$

that is, Γ_d(U_0) is the right frontier of v.

Now we can define Θ : Σ_0^+ → Σ* and Π : Σ_0^+ → Σ* by setting

Θ = Γ_s ∘ Γ_d

and

Π = Γ_d ∘ Γ_s.
Remark 5. For any word Uq e , one has (U o e)(Uo) = (6 o U)(Uo) = T^(Uo) = r|([/o) = Ue^\ So U
and 6 are not bijections on EJ"^ (neither are and T^). They are instead projections ofT^^^ on S''. 

Recursive formulae for F^, T^^ H and O as functions on words over HJ"^ are derived below. 
We clearly have 

Ts(e) = Td(e) = U(e) = e(e)=e. 

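Lemma 4.5. Let $U_0 \in \Sigma_0^*$. Then:

1. $\Gamma_s(U_0 1) = \Gamma_d \circ \Gamma_s \circ \Gamma_d(U_0)\,1$

2. $\Gamma_s(U_0 2) = \overline{\Gamma_d(U_0)}\,2$

3. $\Gamma_s(U_0 0) = \overline{\Gamma_d \circ \Gamma_s \circ \Gamma_d(U_0)}\,2$

4. $\Gamma_d(U_0 1) = \overline{\Gamma_d \circ \Gamma_s \circ \Gamma_d \circ \Gamma_s \circ \Gamma_d(U_0)}\,2$

5. $\Gamma_d(U_0 2) = \Gamma_d \circ \Gamma_s \circ \Gamma_d \circ \Gamma_s(\overline{\Gamma_d(U_0)})\,1$

6. $\Gamma_d(U_0 0) = (\Gamma_d \circ \Gamma_s \circ \Gamma_d \circ \Gamma_s)(\overline{\Gamma_d \circ \Gamma_s \circ \Gamma_d(U_0)})\,1$

7. $\Pi(U_0 1) = \overline{\Pi(U_0)}\,2$

8. $\Pi(U_0 2) = \Pi^2(U_0)\,1$

9. $\Pi(U_0 0) = \Pi(U_0)\,1$

10. $\Theta(U_0 1) = \Theta^2(U_0)\,2$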

11. e(Uo2) = e( r^(Uo) )i = r,(r,(?7o))i 

12. $\Theta(U_0 0) = \Theta(\overline{\Theta(U_0)})\,1$



5. Right extensions 

In this section, we describe how a $C^\infty$-word of height $k$ can be extended to the right into a $C^\infty$-word of height $k + 1$. We do this from the point of view of the vertical representation. Indeed, an extension to the right of a $C^\infty$-word eventually results in an extension by one letter of its left frontier when the height of the word becomes $k + 1$. We restrict our attention to single-rooted minimal words. An analogous approach can be applied to the extensions on the left and the right frontier.

Let $u$ be a single-rooted minimal word with left frontier $U \in \Sigma^*$, and let $k > 0$ be the height of $u$, that is, the length of $U$.

What are the shortest $C^\infty$-words of height $k + 1$ having $u$ as a prefix? These are exactly the shortest $C^\infty$-words of height $k + 1$ whose left frontiers have $U$ as a prefix. That is, they are the shortest $C^\infty$-words having left frontier, respectively, $U1$, $U2$ and $U0$. We call these three words, respectively, the right 1-extension, the right 2-extension and the right 0-extension of the word $u$.
Note that the right 1-extension and the right 2-extension are single-rooted minimal words, whereas the right 0-extension is not minimal, since its left frontier does not belong to $\Sigma^*$. So, what is the single-rooted minimal word corresponding to the right 0-extension of $u$? That is, what is the minimal part of the right 0-extension of $u$? The following theorem provides the answer to this question.

Theorem 5.1. Let $u$ be a single-rooted minimal word of height $k > 0$, and $U \in \Sigma^*$ its left frontier. The word $u$ can be extended to the right into two single-rooted minimal words of height $k + 1$: the right 1-extension of $u$, which is the single-rooted minimal word having left frontier $U1$, and the right 2-extension of $u$, which is the single-rooted minimal word having left frontier $U2$; and into a single-rooted (not minimal) word having left frontier $U0$, called the right 0-extension of $u$.

Moreover, the minimal part of the right 0-extension of $u$ is the single-rooted minimal word having left frontier $\Theta(U)2$.

Remark 6. The right 0-extension and the right 1-extension of a word u are words that differ only on the 
last letter. 

Example 9. Let $u = 2121122$, having left frontier $U = 2122$. The right 2-extension of $u$ is the word 212112212212 having left frontier $U2$; the right 1-extension of $u$ is the word 212112212211212 having left frontier $U1$; the right 0-extension of $u$ is the word 212112212211211 having left frontier $U0$, whose minimal part is the word 112212211211 having left frontier $\Theta(U)2$. The situation is depicted in Fig. 6.
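The frontiers quoted in this example can be recomputed directly from the words. The short Python sketch below assumes a coding convention that is inferred from the example rather than taken from the formal definition: the letter at level $i$ is the first letter of the $i$-th derivative, and it is written 0 when that letter is a 2 coming from the second run of the previous level, the first run having length 1. The helper names are illustrative.

```python
from itertools import groupby

def run_lengths(w):
    """Lengths of the maximal runs of w, e.g. '2122' -> [1, 1, 2]."""
    return [len(list(g)) for _, g in groupby(w)]

def derivative(w):
    """Discard a first/last run of length 1, then run-length encode."""
    r = run_lengths(w)
    if r and r[0] == 1:
        r = r[1:]
    if r and r[-1] == 1:
        r = r[:-1]
    return "".join(map(str, r))

def left_frontier(w):
    """Left frontier over {0,1,2} under the convention stated above."""
    letters, prev = [], None
    while w:
        if prev is not None and w[0] == "2" and run_lengths(prev)[0] == 1:
            letters.append("0")
        else:
            letters.append(w[0])
        prev, w = w, derivative(w)
    return "".join(letters)

if __name__ == "__main__":
    for word in ("2121122", "212112212212",
                 "212112212211212", "212112212211211", "112212211211"):
        print(word, left_frontier(word))
```

Run on the five words of the example, the sketch returns 2122, 21222, 21221, 21220 and 12212, that is $U$, $U2$, $U1$, $U0$ and $\Theta(U)2$ with $\Theta(U) = 1221$.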




Figure 6: The right 2-extension, the right 1-extension and the right 0-extension of the single-rooted minimal word u = 2121122. 



The left 1-extension, the left 2-extension and the left 0-extension are defined analogously.
We can now define an infinite directed acyclic graph for representing the minimal classes of $C^\infty$-words according to the three right extensions (see Fig. 7).




Figure 7: The minimal classes of the right extensions with $\Theta$.



The graph of $C^\infty$-words [7] is the graph $G$ defined by

$$G = (V, \mathcal{E}),$$

where $V = \Sigma^*$ and the set $\mathcal{E}$ of labeled edges is partitioned into three subsets:

• $\mathcal{E}_1 = \{(U, 1, U1)\}$, solid edges

• $\mathcal{E}_2 = \{(U, 2, U2)\}$, solid edges

• $\mathcal{E}_0 = \{(U, 0, \Theta(U)2)\}$, dashed edges

Thus, $G$ is obtained by adding to an infinite complete binary tree (with edge labels in $\Sigma$ and node labels in $\Sigma^*$) one additional edge outgoing from each node (labeled by 0).

A partial diagram of the graph $G$ is depicted in Fig. 9. The edges in $\mathcal{E}_0$ are dashed.




Figure 8: The minimal classes of the right extensions with $\Pi$.

Clearly, one can define the graph $G$ by means of $\Pi$ instead of $\Theta$ (see Fig. 8), by defining

$$G' = (V, \mathcal{E}'),$$

where $V = \Sigma^*$ and the set $\mathcal{E}'$ of labeled edges is partitioned into three subsets:

• $\mathcal{E}'_1 = \{(U, 1, U1)\}$, solid edges

• $\mathcal{E}'_2 = \{(U, 2, U2)\}$, solid edges

• $\mathcal{E}'_0 = \{(\Pi(U), 0, U2)\}$, dashed edges

Since $\Pi = \Theta^{-1}$ over $\Sigma^*$, one has $G = G'$.

Remark 7. As a consequence of Theorem 4.4, the graph $G$ can be defined recursively and with no explicit reference to $C^\infty$-words.
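To make the recursive nature of the graph concrete, the following Python sketch enumerates the labels of all paths from the origin to a given node, using only the description of $G'$: a node $U1$ is reached from $U$ by a 1-edge, and a node $U2$ is reached from $U$ by a 2-edge and from $\Pi(U)$ by a 0-edge. Two assumptions are made here and should be checked against the formal statements: $\Pi$ is computed by the recursion $\Pi(U1) = \overline{\Pi(U)}2$, $\Pi(U2) = \Pi^2(U)1$ with the bar read as letterwise complement, and no 0-edge leaves the origin, which is what the four paths listed for the node 2122 in Fig. 10 suggest. The names `pi` and `paths_to` are illustrative.

```python
from functools import lru_cache

_SWAP = str.maketrans("12", "21")

@lru_cache(maxsize=None)
def pi(u):
    """Pi on node labels over {1,2} (assumed recursion, see lead-in)."""
    if not u:
        return ""
    head, last = u[:-1], u[-1]
    if last == "1":
        return pi(head).translate(_SWAP) + "2"
    return pi(pi(head)) + "1"

def paths_to(node):
    """Labels of all paths in G' from the origin (empty word) to `node`."""
    if not node:
        return [""]
    head, last = node[:-1], node[-1]
    # solid edge (head, last, node)
    labels = [p + last for p in paths_to(head)]
    # dashed edge (Pi(head), 0, head 2); assumed absent at the origin
    if last == "2" and head:
        labels += [p + "0" for p in paths_to(pi(head))]
    return labels

if __name__ == "__main__":
    print(sorted(paths_to("2122")))
```

For the node 2122 the sketch returns the labels 1002, 2110, 2122 and 2202, the same four paths that are listed in the caption of Fig. 10.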

Let $w$ be a $C^\infty$-word and $(U_0, V_0)$ its vertical representation, $U_0, V_0 \in \Sigma_0^*$. Let $u$ be the minimal part of $w$ and $(U, \Gamma_s(U))$ the vertical representation of $u$, $U \in \Sigma^*$. If $w$ is double-rooted, then $w$ contains a double-rooted minimal factor $u'$ having vertical representation $(U, \Gamma_d(U))$, see Fig. 1.

By Theorem 2.3, the word $w$ is a simple extension on the left of $u$ and it is a simple extension on the right of $u$ (if $w$ is single-rooted) or of $u'$ (if $w$ is double-rooted).

The left frontier $U_0$ of $w$ is the label of a path in $G$ starting at the origin and ending in $U$. Analogously, the right frontier $V_0$ of $w$ is the label of a path in $G$ starting at the origin and ending in $\Gamma_s(U)$ (if $w$ is single-rooted) or in $\Gamma_d(U)$ (if $w$ is double-rooted).
Figure 9: The graph $G$ cut at height 4.

Figure 10: The paths in $G$ which start at the origin and arrive in the node 2122 are 2110, 1002, 2202 and 2122. They code the left frontiers of the simple extensions to the left of any word having left frontier 2122.



Conversely, any word $U_0 \in \Sigma_0^*$ labeling a path in $G$ starting at the origin and ending in $U \in \Sigma^*$ is the left frontier of a simple extension on the left of any $C^\infty$-word having $U$ as left frontier and, symmetrically, it is also the right frontier of a simple extension on the right of any $C^\infty$-word having $U$ as right frontier. An example is given in Fig. 10.

Thus, we have the following characterization of the minimal classes of $C^\infty$-words by means of the graph $G$.

Theorem 5.2. Let $U \in \Sigma^*$ be the left frontier of a single-rooted minimal word $u$. The $C^\infty$-words having $u$ as minimal part are exactly the words $w$ having vertical representation $(U_0, V_0)$ such that $U_0$ labels a path in $G$ starting at the origin and ending in $U$, and $V_0$ labels a path in $G$ starting at the origin and ending in $\Gamma_s(U)$ (if $w$ is single-rooted) or $\Gamma_d(U)$ (if $w$ is double-rooted).

In particular, the number of distinct paths in $G$ starting at the origin and ending in $U \in \Sigma^*$ is equal to the number of simple extensions on the left (resp. on the right) of a single-rooted minimal word having $U$ as left (resp. right) frontier.

This allows us to show that there exists a close relation between the paths in G and the length of the 
single-rooted minimal words. 

Let $|U|$ denote the length of the single-rooted minimal word $u$ having left frontier $U$ and $\|U\|$ the number of distinct paths in $G$ starting at the origin and ending in $U$.

From the definition of the graph $G$ (see Fig. 8), one has $\|U1\| = \|U\|$ and $\|U2\| = \|U\| + \|\Pi(U)\|$.

Since $\|U\|$ is the number of simple extensions to the left of $u$, we have that the length of the left 2-extension of $u$ (that is, the single-rooted minimal word having left frontier $\Pi(U)2$) is equal to $|U| + \|U\|$ (see Fig. 6). Symmetrically, the length of the right 2-extension of $u$ (which is the single-rooted minimal word having left frontier $U2$) is equal to $|\Gamma_s(U)| + \|\Gamma_s(U)\|$. Analogous considerations hold for the (left and right) 1-extensions. Note also that one has $|U| = |\Gamma_s(U)|$, $|\Theta(U)| = |\Gamma_d(U)|$, $|U2| = |U2|$ and $|U1| = |U0|$.

We therefore deduce the following recursive formulae:

1. $|U2| = |U| + \|\Gamma_s(U)\| = |\Theta(U)| + \|\Theta(U)\|$

2. $|U1| = |\Theta(U)| + \|\Theta(U)\| + \|\Gamma_d(U)\|$
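The two formulae can be checked numerically on Example 9, where the words with left frontiers 2122, 21222 and 21221 have lengths 7, 12 and 15. The Python sketch below is written under several assumptions that go beyond what is stated here: $\Pi$, $\Gamma_s$ and $\Gamma_d$ are computed by the recursion of Theorem 4.4 read with a letterwise complement, $\Theta = \Gamma_s \circ \Gamma_d$, no 0-edge leaves the origin of $G$, the last term of the second formula is taken to be $\|\Gamma_d(U)\|$, and the words of height 1 are used as base cases since the formulae are stated for height $k > 0$. The function names are illustrative.

```python
from functools import lru_cache

_SWAP = str.maketrans("12", "21")

@lru_cache(maxsize=None)
def pi(u):
    if not u:
        return ""
    head, last = u[:-1], u[-1]
    if last == "1":
        return pi(head).translate(_SWAP) + "2"
    return pi(pi(head)) + "1"

@lru_cache(maxsize=None)
def gamma_s(u):
    if not u:
        return ""
    return pi(gamma_s(u[:-1]) + u[-1].translate(_SWAP))

def gamma_d(u):
    return pi(gamma_s(u))

def theta(u):
    return gamma_s(gamma_d(u))

@lru_cache(maxsize=None)
def n_paths(u):
    """||U||: number of paths from the origin to U."""
    if not u:
        return 1
    head, last = u[:-1], u[-1]
    total = n_paths(head)
    if last == "2" and head:          # assumed: no 0-edge out of the origin
        total += n_paths(pi(head))
    return total

@lru_cache(maxsize=None)
def length(u):
    """|U|: length of the single-rooted minimal word with left frontier U."""
    if len(u) <= 1:
        return len(u)                 # base cases: the words 1 and 2
    head, last = u[:-1], u[-1]
    if last == "2":                   # formula 1
        return length(head) + n_paths(gamma_s(head))
    t = theta(head)                   # formula 2
    return length(t) + n_paths(t) + n_paths(gamma_d(head))

if __name__ == "__main__":
    for frontier in ("2122", "21222", "21221"):
        print(frontier, length(frontier), n_paths(frontier))
```

Under these assumptions the computed lengths are 7, 12 and 15, in agreement with the words given in Example 9.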

We think that these formulae show the interest of dealing with the graph $G$ when considering problems on $C^\infty$-words. For example, analogous formulae can be applied to compute the number of 1's and 2's in a $C^\infty$-word, thus giving a possible direction to investigate Keane's conjecture on the frequency of letters in the Kolakoski word.

Figure 11: A conjecture on the graph $G$ implying the Universal Glueing Conjecture.

Figure 12: Conjecture on the existence of a word $uzv \in C^\infty$ for any $u, v \in C^\infty$.



6. Conclusion and open problems 

The vertical representation is a compact representation of $C^\infty$-words that allows one to represent any $C^\infty$-word of length $n$ by means of two words whose length is logarithmic in $n$ (the frontiers). In this paper, we showed that this representation can be defined in terms of two simple recursive functions, which are naturally represented by a directed acyclic infinite graph $G$. The recursive definitions of the functions on the frontiers lead to a recursive definition of the graph $G$, which is therefore independent of the context of $C^\infty$-words.

Besides being more compact, we believe that the new representation presented here will allow the use 
of results from graph theory or poset theory in the study of the Kolakoski word. As an illustration, we 
formulate below two new conjectures on the graph G which, if proven, would imply the validity of important 
conjectures on C°°-words. 

• We conjecture that for any $U_1, U_2 \in \Sigma^*$ it is possible to find two words $V_1, V_2 \in \Sigma_0^*$ such that $U_1V_1$ and $U_2V_2$ label two paths in $G$ starting at the origin and ending in the same node (see Fig. 11). This would imply that for any $u, v \in C^\infty$ there exists $z$ such that $uzv \in C^\infty$ (see Fig. 12). This latter conjecture was referred to as the Universal Glueing Conjecture [7].

• We also conjecture that there exists a linear function $f(k)$ such that for every word $Z \in \Sigma^{f(k)}$ and every word $U \in \Sigma^k$, there exists a word $V \in \Sigma_0^*$ such that $UV$ labels a path in $G$ starting at the origin and ending in $Z$. This would imply that any smooth word is uniformly recurrent. In particular, this would be true for the Kolakoski word.

The key step underlying the construction presented here is the recognition of the fundamental role of minimal words (and associated classes) in the structure of $C^\infty$-words. The simplicity of the representation of $C^\infty$-words in terms of their vertical representation strongly suggests that these are at the heart of the structure of the entire set. This is supported by the fact that, quite surprisingly, the study of the densities of 1's and 2's in a $C^\infty$-word can also be carried out using combinatorial properties of the graph $G$.